Parallelization of While-Loops in Nested Loop Programs for Real-time Multiprocessor Systems
Abstract
Many applications with stream processing behavior contain one or more loops with an unknown number of iterations. These loops have to be parallelized in order to utilize the full capacity of an embedded multiprocessor platform and thus increase the total throughput. This thesis presents a method to automatically extract a parallel task graph, based on function-level parallelism, from a sequential nested loop program with while-loops. The task graph does not rely on a fixed execution order of the tasks. The notion of a single assignment section is introduced and used in the parallelization approach. Single assignment sections restrict the programmer of the sequential nested loop program less than single assignment code does: single assignment must hold only within each single assignment section, not across the complete program. This makes the parallelization of all while-loops possible. Communication between the extracted tasks is done via circular buffers. In a circular buffer, reading and writing are only allowed inside a window. For each circular buffer, a choice is made between a buffer with sliding windows and one with overlapping windows, depending on the costs. In a circular buffer with sliding windows the read and write windows cannot overlap, whereas this is allowed in a circular buffer with overlapping windows. The overhead of the circular buffers is analyzed, and circular buffers with sliding and overlapping windows are compared. Synchronization for the circular buffers is inserted into the task graph to ensure the same functional behavior as the sequential nested loop program. Sufficient buffer capacities for a deadlock-free execution of the parallel task graph are determined using data flow analysis. For each while-loop a cyclo-static data flow model is derived, based on the models for circular buffers with sliding and overlapping windows. The model for overlapping windows is also presented in this thesis.
The models for the while-loops are combined such that buffer capacities can be determined for the complete nested loop program. An extension to cyclo-static data flow models is proposed such that the combination of the while-loops can be modeled. A DVB-T radio receiver in which the user can switch channels after an undetermined amount of time illustrates the parallelization approach. Execution traces show that a higher throughput can be achieved when the presented parallelization approach is used. An extension to circular buffers with sliding or overlapping windows is presented
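The sliding-window behavior described in the abstract can be illustrated with a minimal single-producer, single-consumer sketch. This is not the thesis's implementation: the class name `SlidingWindowBuffer`, the method names, and the non-blocking acquire calls that return `True` or `False` are all assumptions made for illustration. The sketch only shows that reads and writes are confined to their windows and that the two windows never overlap.

```python
class SlidingWindowBuffer:
    """Hypothetical circular buffer with non-overlapping read/write windows.

    Positions are absolute counters; a slot index is position % capacity.
    Invariant: read_tail <= read_head <= write_tail <= write_head
               and write_head - read_tail <= capacity.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.read_tail = 0    # start of the read window
        self.read_head = 0    # end of the read window (exclusive)
        self.write_tail = 0   # start of the write window
        self.write_head = 0   # end of the write window (exclusive)

    # Producer side -------------------------------------------------
    def acquire_space(self, n):
        """Grow the write window by n slots if that many are free."""
        if self.write_head + n - self.read_tail > self.capacity:
            return False
        self.write_head += n
        return True

    def write(self, offset, value):
        """Write at any offset, but only inside the write window."""
        pos = self.write_tail + offset
        if not (self.write_tail <= pos < self.write_head):
            raise IndexError("write outside the write window")
        self.data[pos % self.capacity] = value

    def release_data(self, n):
        """Slide the write window forward, handing n slots to the reader."""
        if n < 0 or self.write_tail + n > self.write_head:
            raise ValueError("cannot release more than was acquired")
        self.write_tail += n

    # Consumer side -------------------------------------------------
    def acquire_data(self, n):
        """Grow the read window by n slots if that much data was released."""
        if self.read_head + n > self.write_tail:
            return False
        self.read_head += n
        return True

    def read(self, offset):
        """Read at any offset, but only inside the read window."""
        pos = self.read_tail + offset
        if not (self.read_tail <= pos < self.read_head):
            raise IndexError("read outside the read window")
        return self.data[pos % self.capacity]

    def release_space(self, n):
        """Slide the read window forward, freeing n slots for the writer."""
        if n < 0 or self.read_tail + n > self.read_head:
            raise ValueError("cannot release more than was acquired")
        self.read_tail += n
```

In this sketch a producer task would call `acquire_space`, fill its window in any order with `write`, and then call `release_data`; the consumer mirrors this with `acquire_data`, `read`, and `release_space`. A buffer with overlapping windows, as mentioned in the abstract, would relax the condition that the read window ends before the write window starts; that variant is not modeled here.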
Related resources
Free Scheduling of General Nested Loops For Distributed Memory Architectures
The most expensive part of a program, in terms of execution time, is its nested loops. Loop parallelization involves two steps: first, the time partitioning of the index space to achieve the minimum makespan, and second, the efficient assignment of the concurrent partitions to the target parallel architecture. If distributed memory multiprocessor systems are used, overall performance is decline...
JavaSpMT: A Speculative Thread Pipelining Parallelization Model for Java Programs
This paper presents a new approach to improve performance of Java programs by extending the superthreaded speculative execution model [14, 15] to exploit coarse-grained parallelism on a shared-memory multiprocessor system. The parallelization model, called Java Speculative MultiThreading (JavaSpMT), combines control speculation with run-time dependence checking to parallelize a wide variety of l...
Affine Transformations for Communication Minimized Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences
A long running program often spends most of its time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that improve performance by parallelization as well as better locality. Although a significant am...
Automatic parallelization of nested loop programs for non-manifest real-time stream processing applications
This thesis is concerned with the automatic parallelization of real-time stream processing applications, such that they can be executed on embedded multiprocessor systems. Stream processing applications can be encountered in the channel decoding and video decoding domain. These applications typically have real-time requirements. Important trends for stream processing applications are that they ...
Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences
A long running program often spends most of its time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that improve performance by parallelization as well as better locality. Although a significant am...